Apparatus and method for optimizing quantized machine-learning algorithm

ABSTRACT

Disclosed herein are an apparatus and method for optimizing a quantized machine-learning algorithm. The apparatus includes one or more processors and executable memory for storing at least one program executed by the one or more processors. The at least one program sets the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculates a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensates for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculates an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Korean Patent Applications No. 10-2019-0138793, filed Nov. 1, 2019, and No. 10-2020-0046724, filed Apr. 17, 2020, which are hereby incorporated by reference in their entireties into this application.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates generally to machine-learning and signal-processing technology, and more particularly to technology for optimizing a quantized machine-learning algorithm.

2. Description of the Related Art

In conventional machine-learning and nonlinear-signal-processing technology, a signal-processing operation is performed based on floating-point operations. However, the conventional machine-learning and nonlinear-signal-processing technology is not suitable for fields in which small and lightweight hardware is required, because such technology uses multiple operation modules in order to provide real-time operation and because the size and complexity of a computation module provided for floating-point operations are greater than the size and complexity of a computation module provided for integer operations.

Accordingly, research on quantization of processing data is underway in various engineering fields. Quantization of processing data may confer many advantages, such as a decrease in the number of bits, calculation speed improvement, and availability improvement, when an engineering solution is implemented. For example, because a learning equation of a quantized domain is implemented by updating the least significant bit of a parameter, it is the same as applying a fixed learning rate to an update parameter. However, a general stochastic steepest-descent algorithm having a fixed step size cannot avoid performance degradation because of convergence on the optimum point in weak topology, such as convergence with first-order distribution.

Meanwhile, Korean Patent Application Publication No. 10-2018-0043154, titled “Method and apparatus for neural network quantization”, relates to a neural network quantization method, and discloses a method for quantizing the parameters of a neural network, which includes determining the diagonals of a second-order partial derivative matrix (Hessian matrix) of the loss function of the network parameters of the neural network and assigning Hessian weights to the network parameters using the determined diagonals as part of the step of quantizing the network parameters.

SUMMARY OF THE INVENTION

An object of the present invention is to implement an optimization algorithm capable of minimizing a quantization error in machine-learning and nonlinear-signal-processing fields using quantization and exhibiting excellent performance on lightweight hardware.

Another object of the present invention is to implement a machine-learning algorithm capable of providing sufficient optimization performance even on low-performance hardware.

In order to accomplish the above objects, an apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention may include one or more processors and executable memory for storing at least one program executed by the one or more processors. The at least one program may set the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculate a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensate for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculate an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

Here, the at least one program may set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.

Here, the at least one program may set any one of a first candidate value, acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.

Here, the at least one program may set any one of the first candidate value and the second candidate value as the learning rate when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value.

Here, the at least one program may select a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector and quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.

Here, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the at least one program may make the solution escape from the local minimum point using the quantized orthogonal compensation search vector.

Also, in order to accomplish the above objects, a method for optimizing a quantized machine-learning algorithm, performed by an apparatus for optimizing the quantized machine-learning algorithm, according to an embodiment of the present invention may include setting the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods; calculating a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm and compensating for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector; and calculating an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

Here, setting the learning rate may be configured to set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.

Here, setting the learning rate may be configured to set any one of a first candidate value, acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.

Here, setting the learning rate may be configured to set any one of the first candidate value and the second candidate value as the learning rate when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value.

Here, compensating for the search performance may be configured to select a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector and to quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.

Here, compensating for the search performance may be configured such that, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution is made to escape from the local minimum point using the quantized orthogonal compensation search vector.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, features and advantages of the present invention will be more clearly understood from the following detailed description, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart illustrating a method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an optimization algorithm using a quantized Armijo rule according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating an optimization algorithm using a golden search according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating an optimization algorithm using a compensated search vector according to an embodiment of the present invention; and

FIG. 5 is a view illustrating a computer system according to an embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in detail below with reference to the accompanying drawings. Repeated descriptions and descriptions of known functions and configurations that have been deemed to unnecessarily obscure the gist of the present invention will be omitted below. The embodiments of the present invention are intended to fully describe the present invention to a person having ordinary knowledge in the art to which the present invention pertains. Accordingly, the shapes, sizes, etc. of components in the drawings may be exaggerated in order to make the description clearer.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention.

Table 1 shows the mathematical symbols used for explaining a quantized machine-learning algorithm according to an embodiment of the present invention.

TABLE 1

  Symbol        Description
  R             real number space
  Z             integer space
  N             natural number space; 0 is not included
  R^(n)         n-dimensional vector space of real numbers
  Z^(n)         n-dimensional vector space of integers
  R(a, b]       {x | ∀x ∈ R, a < x ≤ b}
  R[a, b)       {x | ∀x ∈ R, a ≤ x < b}
  Z[a, b]       {x | ∀x ∈ Z, a ≤ x ≤ b}
  Z(a, b)       {x | ∀x ∈ Z, a < x < b}
  δ(x)          Dirac-Delta function: if x = 0, δ(x) = 1, and if x ≠ 0, δ(x) = 0
  E_(Q_p) x     average value of x based on the probability distribution for the quantization error caused by the quantization coefficient Q_(p)

The definition of learning quantization and the main quantization operations according to an embodiment of the present invention may be described as follows.

First, in order to define quantization of a variable x∈R, a round-off operation for obtaining an integer may be defined as shown in Equation (1).

x≡└x┘+ϵ (ϵ∈R[0,1))   (1)

In Equation (1), the symbol └x┘∈Z indicates a round-off (floor) operation, which may be defined as the largest integer that does not exceed x. Using this, the Gauss symbol may be defined as shown in Equation (2).

$\begin{matrix}{\left\lbrack x \right\rbrack \equiv \left\lfloor {x + 0.5} \right\rfloor = x + 0.5 - \epsilon = x + ɛ} & (2)\end{matrix}$

When the operation of rounding x to the nearest number is defined as [x] using Equation (2), the round-off error ɛ may be defined as ɛ∈R(−0.5, 0.5]. Therefore, when an arbitrary integer n∈Z is given, the relationships of addition and multiplication may be represented as shown in Equation (3).

$\begin{matrix}{\begin{matrix}{\left\lbrack {x + n} \right\rbrack = {\left\lfloor {x + n + 0.5} \right\rfloor = {\left\lfloor {x + 0.5} \right\rfloor + n} = {\left\lbrack x \right\rbrack + n}}} \\{\left\lbrack {n \cdot \left\lbrack x \right\rbrack} \right\rbrack = {\left\lfloor {{n \cdot \left\lfloor {x + 0.5} \right\rfloor} + 0.5} \right\rfloor = {n \cdot \left\lfloor {x + 0.5} \right\rfloor} = {n \cdot \left\lbrack x \right\rbrack}},\quad\left\lbrack {n \cdot x} \right\rbrack \neq {n\left\lbrack x \right\rbrack}}\end{matrix}} & (3)\end{matrix}$

Also, when an arbitrary real-number sequence {x_(k)} (∀k∈N, x_(k)∈R) is given, if the round-off operation of each x_(k) is represented using ϵ_(k) as in Equation (1), Equation (4) may be obtained.

$\begin{matrix}{\left\lfloor {\sum\limits_{k = 1}^{n}x_{k}} \right\rfloor = {\left\lfloor {\sum\limits_{k = 1}^{n}\left( {\left\lfloor x_{k} \right\rfloor + \epsilon_{k}} \right)} \right\rfloor = {{\sum\limits_{k = 1}^{n}\left\lfloor x_{k} \right\rfloor} + \left\lfloor {\sum\limits_{k = 1}^{n}\epsilon_{k}} \right\rfloor}}} & (4)\end{matrix}$

Using Equations (1) and (2), a quantization operation may be defined as shown in Equation (5).

$\begin{matrix}{x^{Q}\overset{\Delta}{=}{\frac{1}{Q_{p}}\left\lfloor {{Q_{p} \cdot x} + 0.5} \right\rfloor}} & (5)\end{matrix}$

In Equation (5), Q_(p) denotes a quantization coefficient, which may determine the level of quantization. For example, when quantization to a fixed-point number at the level of 10⁻³ is attempted, Q_(p)=10³ may be set. For convenience, the quantization coefficient may be set to a positive integer (Q_(p)∈Z, Q_(p)>0). In order to represent quantization of a specific number without the Gauss symbol, a round-off error is used, whereby Equation (5) may be replaced with Equation (6).

$\begin{matrix}{x^{Q} = {\frac{1}{Q_{p}}\left\lbrack {Q_{p} \cdot x} \right\rbrack} = {\frac{1}{Q_{p}}\left\lfloor {{Q_{p} \cdot x} + 0.5} \right\rfloor} = {\frac{1}{Q_{p}}\left( {Q_{p} \cdot \left( {x + ɛ} \right)} \right)} = {x + ɛ}} & (6)\end{matrix}$

In Equation (6), x^(Q) satisfies x^(Q)∈R, but Q_(p)x^(Q)=[Q_(p)·x]∈Z. That is, in the case of x^(Q), quantization may be performed to obtain a fixed-point number format, rather than simply an integer.

In Equation (6), the range of the round-off error ɛ may be represented as ɛ∈R(−0.5Q_(p)⁻¹, 0.5Q_(p)⁻¹]=R(−5·(10Q_(p))⁻¹, 5·(10Q_(p))⁻¹]. If Q_(p) is large enough that the average value for the distribution of the round-off error depending on Q_(p) satisfies E_(Q_p)ɛ=0, the relationship shown in Equation (7) may be satisfied for x_(k), which is the sample data of x.

$\begin{matrix}{\begin{matrix}{{{\mathbb{E}}_{Q_{p}}x^{Q}}\overset{\Delta}{=}{{\lim\limits_{N\rightarrow\infty}{\frac{1}{N}{\sum\limits_{k = 0}^{N - 1}x_{k}^{Q}}}} = {\lim\limits_{N\rightarrow\infty}{\frac{1}{N}{\sum\limits_{k = 0}^{N - 1}\left( {x_{k} + ɛ_{k}} \right)}}}}} \\{= {{{{\mathbb{E}}_{Q_{p}}x} + {{\mathbb{E}}_{Q_{p}}ɛ}} = {{\mathbb{E}}_{Q_{p}}{x.}}}}\end{matrix}\quad} & (7)\end{matrix}$
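For illustration only (this sketch is not part of the original disclosure), the quantization operator of Equations (5) through (7) can be written in a few lines of Python; the helper name quantize is an assumption:

```python
import numpy as np

def quantize(x, Qp):
    """Fixed-point quantization x^Q = (1/Qp) * floor(Qp*x + 0.5), Equation (5)."""
    return np.floor(Qp * np.asarray(x, dtype=float) + 0.5) / Qp

Qp = 10 ** 3                       # quantize at the 10^-3 level
x = np.random.randn(100_000)
xq = quantize(x, Qp)

# Qp * x^Q is an integer (Equation (6)), and the round-off error lies in
# (-0.5/Qp, 0.5/Qp], so its empirical mean is near zero (Equation (7)).
assert np.allclose(Qp * xq, np.round(Qp * xq))
print("mean quantization error:", np.mean(xq - x))
```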

The addition and multiplication of quantized values have the characteristics shown in Equations (4) and (5). However, because the result of division may not be represented as a quantized value, the following method is applied. That is, when division of two integers x and a (x, a∈Z) is represented using a quotient and a remainder, division of x by a may be represented as shown in Equation (8).

$\begin{matrix}{\frac{x}{a} = {\left\lfloor \frac{x}{a} \right\rfloor + {\frac{1}{a}\left( {x - {a\left\lfloor \frac{x}{a} \right\rfloor}} \right)}}} & (8)\end{matrix}$

In Equation (8), the quotient may be $\left\lfloor \frac{x}{a} \right\rfloor,$ and the remainder may be $x - {a{\left\lfloor \frac{x}{a} \right\rfloor.}}$

When the Gauss symbol is applied to Equation (8) in order to verify the quotient and the remainder, Equation (9) may be obtained.

$\begin{matrix}{\begin{matrix}{\left\lfloor \frac{x}{a} \right\rfloor = \left\lfloor {\left\lfloor \frac{x}{a} \right\rfloor + {\frac{1}{a}\left( {x - {a\left\lfloor \frac{x}{a} \right\rfloor}} \right)}} \right\rfloor} \\{= {\left\lfloor \frac{x}{a} \right\rfloor + \left\lfloor {\frac{1}{a}\left( {x - {a\left\lfloor \frac{x}{a} \right\rfloor}} \right)} \right\rfloor}}\end{matrix}\quad} & (9)\end{matrix}$

Equation (9) shows that $\left( {x - {a\left\lfloor \frac{x}{a} \right\rfloor}} \right) < a$ is satisfied.

Meanwhile, when the rounding operation is applied to Equation (8), Equation (10) may be obtained.

$\begin{matrix}{\begin{matrix}{\left\lbrack \frac{x}{a} \right\rbrack = \left\lfloor {\frac{x}{a} + 0.5} \right\rfloor} \\{= \left\lfloor {\left\lfloor \frac{x}{a} \right\rfloor + {\frac{1}{a}\left( {x - {a\left\lfloor \frac{x}{a} \right\rfloor}} \right)} + \frac{a}{2a}} \right\rfloor} \\{= {\left\lfloor \frac{x}{a} \right\rfloor + \left\lfloor {\frac{1}{a}\left( {x - {a\left( {\left\lfloor \frac{x}{a} \right\rfloor - \frac{1}{2}} \right)}} \right)} \right\rfloor}}\end{matrix}\quad} & (10)\end{matrix}$

As a result, the conditional expression shown in Equation (11) may be obtained.

$\begin{matrix}{\left\lbrack \frac{x}{a} \right\rbrack = \left\{ \begin{matrix}\left\lfloor \frac{x}{a} \right\rfloor & {{x < {a\left( {\left\lfloor \frac{x}{a} \right\rfloor + \frac{1}{2}} \right)}}\Rightarrow{{a \cdot ɛ} > 0}} \\{\left\lfloor \frac{x}{a} \right\rfloor + 1} & {{x \geq {a\left( {\left\lfloor \frac{x}{a} \right\rfloor + \frac{1}{2}} \right)}}\Rightarrow{{a \cdot ɛ} \leq 0}}\end{matrix} \right.} & (11)\end{matrix}$

In Equation (11), the condition under which the number added to $\left\lfloor \frac{x}{a} \right\rfloor$ becomes 1 or 0 may be represented as shown in Equation (12). In order for the number to become 1 based on the definition of the Gauss symbol, the condition shown in Equation (12) should be satisfied.

$\begin{matrix}{{x - {a\left( {\left\lfloor \frac{x}{a} \right\rfloor - 0.5} \right)}} \geq a} & (12) \\{\Rightarrow{x \geq {{a \cdot \left\lfloor \frac{x}{a} \right\rfloor} + {0.5a}}}} & \; \\{{\Rightarrow{x \geq {{a \cdot \left( {\frac{x}{a} - \epsilon} \right)} + {0.5a}} = {x - {a\epsilon} + {0.5a}}}}\quad{\because{x = {\left\lfloor x \right\rfloor + \epsilon}}\ \text{by Equation (1)}}} & \; \\{{\Rightarrow{{a\epsilon} \geq {0.5a}}\Rightarrow{{a \cdot ɛ} \leq 0}}\quad{\because{ɛ = {0.5 - \epsilon}}\ \text{by Equation (2)}}} & \;\end{matrix}$

Particularly, when ${\frac{x}{a}} < 1$ is satisfied because |x|<|a| is satisfied, this may be expanded to ${{Q_{p}{\frac{x}{a}}} < Q_{p}},$ in which case the operation may be represented as shown in Equation (13).

$\begin{matrix}{{Q_{p} \cdot \frac{x}{a}} = {\left\lfloor \frac{Q_{p}x}{a} \right\rfloor + {\frac{1}{a}\left( {{Q_{p}x} - {a\left\lfloor \frac{Q_{p}x}{a} \right\rfloor}} \right)}}} & (13)\end{matrix}$

Through Equations (5) and (13), quantization of division may be derived as shown in Equation (15). First, the function g(x,a)∈{0,1} may be defined as shown in Equation (14).

$\begin{matrix}{{g\left( {x,a} \right)} = {\left\lfloor {\frac{1}{a}\left( {{Q_{p}x} - {a\left( {\left\lfloor \frac{Q_{p}x}{a} \right\rfloor - \frac{1}{2}} \right)}} \right)} \right\rfloor = \left\{ \begin{matrix}1 & {{a \cdot ɛ} \leq 0} \\0 & {{a \cdot ɛ} > 0}\end{matrix} \right.}} & (14)\end{matrix}$

When quantization for division is interpreted using Equation (14), this may be represented as shown in Equation (15):

$\begin{matrix}{\begin{matrix}{{\frac{1}{Q_{p}}\left\lbrack {Q_{p} \cdot \frac{x}{a}} \right\rbrack} = {\frac{1}{Q_{p}}\left\lfloor {\left\lfloor \frac{Q_{p}x}{a} \right\rfloor + {\frac{1}{a}\left( {{Q_{p}x} - {a\left( {\left\lfloor \frac{Q_{p}x}{a} \right\rfloor - \frac{1}{2}} \right)}} \right)}} \right\rfloor}} \\{= {{\frac{1}{Q_{p}}\left\lfloor \frac{Q_{p}x}{a} \right\rfloor} + {\frac{1}{Q_{p}}\left\lfloor {\frac{1}{a}\left( {{Q_{p}x} - {a\left( {\left\lfloor \frac{Q_{p}x}{a} \right\rfloor - \frac{1}{2}} \right)}} \right)} \right\rfloor}}} \\{= {\frac{1}{Q_{p}}\left( {\left\lfloor \frac{Q_{p}x}{a} \right\rfloor + {g\left( {x,a} \right)}} \right)}}\end{matrix}\quad} & (15)\end{matrix}$

Based on the definition of learning quantization and the main quantization operations according to an embodiment of the present invention, it may be understood that, when quantization based on the distribution of a remainder value is performed, the smallest value is 1 or 0.
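As an illustrative sketch (not part of the original disclosure) of the integer-only division quantization in Equations (8) through (15), assuming a > 0 and a hypothetical function name:

```python
def quantized_division(x, a, Qp):
    """Quantize x/a as (1/Qp) * (floor(Qp*x / a) + g(x, a)) using only integer
    quotient/remainder operations, per Equations (14) and (15) (assumes a > 0)."""
    q, rem = divmod(Qp * x, a)      # Qp*x = a*q + rem with 0 <= rem < a
    g = (2 * rem + a) // (2 * a)    # g(x, a) = 1 if rem >= a/2 else 0, Equation (14)
    return (q + g) / Qp

# Example: 7/3 quantized at Qp = 1000 yields 2.333.
print(quantized_division(7, 3, 1000))
```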

Referring to FIG. 1, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, a learning rate may be set at step S10.

When the above-described quantization method is expanded to a vector, this may be defined in such a way that quantization is applied to the elements of each vector. At step S10, using the method of applying quantization to the elements of each vector, the quantization characteristics of a learning equation may be analyzed.

For example, it may be assumed that the learning equation shown in Equation (16) is given for x_(t)∈R^(n).

x _(t+1) =x _(t)−λ_(t) h _(t)   (16)

In Equation (16), h_(t)∈R^(n) is a search direction vector, λ_(t)∈R(0,1] is a learning rate, and t∈R is a parameter related to time. In Equation (16), when quantization of x_(t) is defined as x_(t)^(Q)≡[x_(t)], the quantized learning equation shown in Equation (17) may be defined.

x _(t+1) ^(Q)=(x _(t)−λ_(t) h _(t))^(Q)   (17)

In Equation (17), λ_(t) denotes a learning rate. Using h_(t), x_(t) or x_(t)^(Q) may be updated to x_(t+1)^(Q) through the update term of Equation (18).

λ_(t) h _(t)=(λ_(t) h _(t))^(Q)   (18)

Additionally, it is assumed that the following basic condition for the learning rate is satisfied.

λ_(i)=arg min_(λ) ƒ(x _(i) −λh _(i))   (19)

Here, when Equation (17), which is a quantized learning equation, is rewritten by substituting the quantized x_(t)^(Q) for x_(t) and by applying Equation (18), which is a quantized update term, Equation (20) may be obtained.

x _(t+1) ^(Q)=(x _(t) ^(Q)−(λ_(t) h _(t))^(Q))^(Q) =x _(t) ^(Q)−(λ_(t) h_(t))^(Q)   (20)

In Equation (20), because the learning rate λ_(t)∈R and the search direction vector h_(t)∈R^(n) are a scalar and a vector, respectively, it is relatively difficult to implement quantization satisfying λ_(t)h_(t)=(λ_(t)h_(t))^(Q). Therefore, even though the optimized learning rate is found through a line search algorithm, it is required to recalculate it so as to enable quantization.

The most intuitive quantization of the update term is to set λ_(t)=1/Q_(p). As a result, ${\lambda_{t}h_{t}} = {\frac{1}{Q_{p}}h_{t}}$ is satisfied, and the quantized update term may be represented as shown in Equation (21).

$\begin{matrix}{\left( {\lambda_{t}h_{t}} \right)^{Q} = {{\frac{1}{Q_{p}}\left( {{Q_{p}\lambda_{t}h_{t}} + {Q_{p}ɛ}} \right)} = {{{\frac{1}{Q_{p}}h_{t}} + ɛ} = \left( {\frac{1}{Q_{p}}h_{t}} \right)^{Q}}}} & (21)\end{matrix}$

Based on this, the function z(k_(t))∈Z, z(k_(t))>0, which outputs an arbitrary positive integer for the internal iteration index k_(t) in the t-th iteration, is defined, and using this function, the learning rate λ_(t) is set to λ_(t)=z(k_(t))Q_(p)⁻¹. Accordingly, the update term may be represented as shown in Equation (22).

$\begin{matrix}{\left( {\lambda_{t}h_{t}} \right)^{Q} = {{{\frac{z\left( k_{t} \right)}{Q_{p}}h_{t}} + ɛ} = \left( {\frac{z\left( k_{t} \right)}{Q_{p}}h_{t}} \right)^{Q}}} & (22)\end{matrix}$

If quantization is defined well so as to satisfy ${\frac{1}{Q_{p}}h_{t}} \in Z$ by appropriately defining Equation (22) (or if quantization can be defined through the internal characteristics of a calculator), because z(k_(t))∈Z is satisfied, Equation (23) may be obtained.

$\begin{matrix}{\left( {\frac{z\left( k_{t} \right)}{Q_{p}}h_{t}} \right)^{Q} = {{z\left( k_{t} \right)}\left( {\frac{1}{Q_{p}}h_{t}} \right)^{Q}}} & (23)\end{matrix}$

In Equation (23), z(k_(t))∈Z(0,Q_(p)) is satisfied. When the well-defined ${\frac{1}{Q_{p}}h_{t}} \in Z$ is assumed to be a basic quantization search direction vector, Equation (24) may be defined.

$\begin{matrix}{{h_{t}^{Q}\overset{\Delta}{=}\left( {\frac{1}{Q_{p}}h_{t}} \right)^{Q}},{h_{t}^{Q\;} \in Z}} & (24)\end{matrix}$

When Equation (22) is rewritten using Equation (24), Equation (25) may be obtained.

(λ_(t) h _(t))^(Q) =z(k _(t))h _(t) ^(Q)   (25)
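A minimal sketch of the resulting update, Equations (20) and (25), might look as follows (the names are illustrative, and h_t^Q is assumed to be an integer vector per Equation (24)):

```python
import numpy as np

def quantized_update(xq, hq, z_k):
    """One learning step per Equations (20) and (25):
    x_{t+1}^Q = x_t^Q - z(k_t) * h_t^Q.

    xq  : current parameters x_t^Q (already on the quantization grid)
    hq  : basic quantization search direction h_t^Q, an integer vector (Equation (24))
    z_k : positive integer z(k_t) in Z(0, Qp); the learning rate is z(k_t) / Qp
    """
    assert int(z_k) == z_k and z_k > 0
    return np.asarray(xq) - z_k * np.asarray(hq)
```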

That is, at step S10, the learning rate of the quantized machine-learning algorithm may be set using at least one of an Armijo rule and golden search methods.

Here, at step S10, using the gradient vector of the objective function of the search direction vector, the learning rate may be set based on a learning-rate-setting function predefined by the Armijo rule.

Here, at step S10, any one of a first candidate value, which is acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, which is acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, may be set as the learning rate.

Here, at step S10, when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value, any one of the first candidate value and the second candidate value may be set as the learning rate.

Also, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, quantization search performance may be compensated for at step S20.

That is, at step S20, a quantized orthogonal compensation search vector may be calculated from the search direction vector of the quantized machine-learning algorithm, and the search performance of the quantized machine-learning algorithm may be compensated for using the quantized orthogonal compensation search vector.

Here, at step S20, a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector is selected, and the selected vector is quantized, whereby the quantized orthogonal compensation search vector may be calculated.

Here, at step S20, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution may be made to escape from the local minimum point using the quantized orthogonal compensation search vector.

Also, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, an optimized learning algorithm may be calculated at step S30.

That is, at step S30, the optimized quantized machine-learning algorithm may be calculated using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

FIG. 2 is a flowchart illustrating an optimization algorithm using a quantized Armijo rule according to an embodiment of the present invention.

Referring to FIG. 2, the process of setting the learning rate of a quantized machine-learning algorithm to which the Armijo rule is applied at step S10 is specifically illustrated.

First, at step S110, a quantization parameter may be set.

Also, at step S120, a learning-rate-setting parameter may be set.

Also, at step S130, a search direction vector may be calculated.

The search direction vector h_(t) is obtained from the gradient vector of an objective function ƒ(x) and is represented as h_(t)=−∇ƒ(x_(t)), and the function for setting the learning rate may be represented as shown in Equation (26).

ϕ(β^(k))=−β^(k) α∥h _(t)∥²   (26)

Also, at step S140, when the error of the search direction vector is equal to or less than a preset value, the optimization of the quantized machine-learning algorithm is terminated, but when the error is greater than the preset value, the learning rate may be set at step S150.

At step S150, the learning rate may be set using Equation (27), to which the Armijo rule is applied.

$\begin{matrix}{\lambda_{t}\overset{\Delta}{=}{\underset{k \in N}{argmax}\left\{ {\beta^{k}\left| {{{f\left( {x_{t} + {\beta^{k}h_{t}}} \right)} - {f\left( x_{t} \right)}} \leq {{{- \beta^{k}}\alpha\left\| h_{t} \right\|^{2}} = {\phi\left( \beta^{k} \right)}}} \right.} \right\}},\quad{\alpha,\beta \in {R\left( {0,1} \right)}}} & (27)\end{matrix}$

Finally, the quantization parameter may be configured with positive integers, as shown in Equation (28).

Q _(p)=η·ρ^(n), η, ρ, n∈Z⁺⁺   (28)

Because λ_(t)=ϕ(β^(k)) is satisfied in the definition of the learning rate, the Armijo rule may be modified so as to satisfy the characteristics of the quantized learning rate in Equation (23). First, assuming that the search direction vector is the basic quantization search direction vector and is represented as (h_(t))^(Q)=Q_(p)h_(t)^(Q), Equation (29) may be obtained.

$\begin{matrix}{\begin{matrix}{{\phi\left( \beta^{k} \right)} = {{- \beta^{k}}\alpha\left\| h_{t} \right\|^{2}}} \\{= {{- \beta^{k}}\alpha\left\| \left( h_{t} \right)^{Q} \right\|^{2}}} \\{= {{- \beta^{k}}\alpha\left\| {Q_{p}h_{t}^{Q}} \right\|^{2}}} \\{{= {{- \beta^{k}}\alpha Q_{p}^{2}\left\| h_{t}^{Q} \right\|^{2}}\quad\because{Q_{p} \in Z}},{Q_{p} > 0}} \\{= {{- \left( {\beta^{k}Q_{p}} \right)} \cdot {\left( {\alpha Q_{p}} \right)\left\| h_{t}^{Q} \right\|^{2}}}} \\{= {{- \left( {\beta^{k}{\eta\rho}^{n}} \right)} \cdot {\left( {\alpha{\eta\rho}^{n}} \right)\left\| h_{t}^{Q} \right\|^{2}}}} \\{= {{- \left( {\beta^{k}\rho^{n}} \right)} \cdot {\left( {\alpha\eta^{2}\rho^{n}} \right)\left\| h_{t}^{Q} \right\|^{2}}}}\end{matrix}} & (29)\end{matrix}$

Equation (29) may be solved as shown in Equation (30).

$\begin{matrix}{{{\phi\left( {\overset{\_}{\beta}}^{\overset{\_}{k}} \right)} \equiv {{- \left( {\overset{\_}{\beta}}^{\overset{\_}{k}} \right)} \cdot \overset{\_}{\alpha}\left\| h_{t}^{Q} \right\|^{2}}},\quad{0 < {\overset{\_}{\beta}}^{\max\overset{\_}{k}} \leq \frac{Q_{p}}{\eta}},\quad{0 < \overset{\_}{\alpha} \leq {\eta \cdot Q_{p}}}} & (30)\end{matrix}$

Because α and β are arbitrary values satisfying α,β∈R(0,1), an integer ζ that is equal to or less than η is taken (that is, ζ≤η) for α in Equation (30), whereby the condition for ᾱ may be satisfied as shown in Equation (31).

$\begin{matrix}{{\overset{\_}{\alpha} = {{\zeta \cdot Q_{p}} \in Z}},{{{\eta < \overset{\_}{\alpha} \leq {\eta \cdot Q_{p}}}\mspace{14mu}\because\alpha} = \frac{\zeta}{\eta}}} & (31)\end{matrix}$

Also, when β=ρ⁻¹ is set, β̄ may be represented as shown in Equation (32) for the fixed value n.

$\begin{matrix}{{{\overset{\_}{\beta}}^{\overset{\_}{k}} = {{\beta^{k}\rho^{n}} = {{\rho^{- k}\rho^{n}} = \rho^{n - k}}}},{0 < \overset{\_}{k} \leq n}} & (32)\end{matrix}$

Therefore, β̄ and k̄ are defined as β̄=ρ, k̄=n−k, and because max k̄=n is satisfied, Equation (33) may be obtained.

$\begin{matrix}{{\overset{\_}{\beta}}^{\max\overset{\_}{k}} = {\rho^{n} = \frac{Q_{p}}{\eta}}} & (33)\end{matrix}$

Therefore, when the objective function ƒ:R^(n)→R is configured so as to satisfy ƒ:Z^(n)→Z on Z^(n) and when h₀ is configured so as to satisfy h₀=−∇ƒ(x₀)∈Z^(n), the learning rate λ_(t)^(Q) based on the Armijo rule may be calculated using only the integer operations shown in Equation (34), and may then be applied to the quantized machine-learning algorithm at step S150.

$\begin{matrix}{\lambda_{t}^{Q}\overset{\Delta}{=}{\underset{\overset{\_}{k} \in N}{argmax}\left\{ {{\overset{\_}{\beta}}^{\overset{\_}{k}}\left| {{{f\left( {x_{t} + {{\overset{\_}{\beta}}^{\overset{\_}{k}}h_{t}}} \right)} - {f\left( x_{t} \right)}} \leq {{{- {\overset{\_}{\beta}}^{\overset{\_}{k}}}\overset{\_}{\alpha}\left\| h_{t}^{Q} \right\|^{2}} = {\phi\left( {\overset{\_}{\beta}}^{\overset{\_}{k}} \right)}}} \right.} \right\}}} & (34)\end{matrix}$
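A compact sketch of the quantized Armijo search of Equation (34) is given below; the function and parameter names are assumptions, and f is taken to be integer-valued on integer vectors as required above:

```python
import numpy as np

def quantized_armijo(f, x, h, rho, n, zeta, eta):
    """Sketch of the quantized Armijo rule, Equation (34).

    f    : objective, integer-valued on integer vectors (f: Z^n -> Z)
    x    : current iterate (integer vector)
    h    : search direction h_t = -grad f(x_t), assumed to equal Qp * h_t^Q
    zeta : integer with zeta <= eta, giving alpha_bar = zeta * Qp (Equation (31))
    """
    Qp = eta * rho ** n                  # quantization coefficient, Equation (28)
    hq = h // Qp                         # basic quantization direction h_t^Q
    alpha_bar = zeta * Qp
    decrease = int(np.dot(hq, hq))       # ||h_t^Q||^2
    fx = f(x)
    for k_bar in range(1, n + 1):        # try the largest step scale first
        beta_bar = rho ** (n - k_bar)    # beta_bar^k_bar = rho^(n - k_bar), Equation (32)
        if f(x + beta_bar * h) - fx <= -beta_bar * alpha_bar * decrease:
            return beta_bar              # argmax: first (largest) admissible step
    return 1                             # fall back to the smallest quantized step
```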

Also, at step S160, the quantized learning equation may be updated using the set learning rate.

FIG. 3 is a flowchart illustrating an optimization algorithm using a golden search according to an embodiment of the present invention.

Referring to FIG. 3, the process of setting the learning rate of a quantized machine-learning algorithm to which a golden search method is applied at step S10 is specifically illustrated.

The golden search method is a method for finding the optimized learning rate, like the Armijo rule, and a learning rate satisfying Equation (35) may be set.

$\begin{matrix}{\left. {\min\limits_{\lambda \geq 0}\left\{ {f\left( {x_{i} + {\lambda\; h_{i}}} \right)} \right\}}\Rightarrow{{Let}\mspace{14mu}{\phi(\lambda)}} \right. = \left. {{f\left( {x_{i} + {\lambda\; h_{i}}} \right)} - {f\left( x_{i} \right)}}\Rightarrow{\min\limits_{\lambda}{\phi(\lambda)}} \right.} & (35)\end{matrix}$

First, at step S210, a quantization parameter may be set.

Also, at step S220, a search range for applying the golden search method may be set.

Also, at step S230, an initial condition for applying the golden search method may be set.

Here, at step S230, when quantization is not applied, a₀=0 and b₀=1 may be set, and h_(t)=Q_(p)h_(t)^(Q) may be set.

Also, at step S240, the golden search method may be applied.

Here, at step S240, the i-th search range l_(i) may be defined as shown in Equation (36).

l _(i) ≡b _(i) −a _(i)   (36)

Here, a_(i) and b_(i) may be the minimum candidate value and the maximum candidate value for the learning rate λ.

Here, at step S240, when the value of the search range l_(i) is made to approach 0 by increasing the minimum candidate value by a golden ratio and decreasing the maximum candidate value by the golden ratio using the golden search method, one of the minimum candidate value and the maximum candidate value may be set as the learning rate.

The golden ratio used for the golden search may be F₀=0.618.

Also, at steps S250 to S280, the minimum candidate value and the maximum candidate value may be updated.

Here, the update of the minimum candidate value and the maximum candidate value may be represented as shown in Equation (37).

a′ _(i) =a _(i)+(1−F ₀)l _(i) , b′ _(i) =b _(i)−(1−F ₀)l _(i)   (37)

Because a_(i) and b_(i) are set to a_(i),b_(i)∈R(0,1) in Equation (37), quantization thereof is required.

Using Equations (24) and (25), the quantized learning equation shown in Equation (38) may be calculated.

x _(t+1) ^(Q) =x _(t) ^(Q)−(λ_(t) h _(t))^(Q) =x _(t) ^(Q)−(λ_(t) Q _(p)h _(t) ^(Q))^(Q) =x _(t) ^(Q)−(λ_(t) Q _(p))^(Q) h _(t) ^(Q)   (38)

In Equation (38), because (λ_(t)Q_(p))^(Q) is a value that falls within the range of Z(0,Q_(p)), when λ_(t)^(Q)≡(λ_(t)Q_(p))^(Q) is set, Equation (39) may be obtained.

a _(i) ^(Q)≡(a _(i) Q _(p))^(Q) , b _(i) ^(Q)≡(b _(i) Q _(p))^(Q) , a _(i) ^(Q) , b _(i) ^(Q) ∈Z(0,Q _(p))   (39)

Equation (39) may be solved as a quantized operation using the golden search method. To this end, F=1−F₀ is set first, and the equation for the minimum candidate in Equation (37) is multiplied by Q_(p), whereby Equation (40) may be obtained.

a′ _(i) Q _(p) =a _(i) Q _(p) +Q _(p)(1−F ₀)l _(i)

a′ _(i) Q _(p) =a _(i) Q _(p) +Q _(p) Fl _(i)

(a′ _(i) Q _(p))^(Q)=(a _(i) Q _(p) +Q _(p) Fl _(i))^(Q)   (40)

The error resulting from quantization is taken into consideration, and because l_(i)=b_(i)^(Q)−a_(i)^(Q)∈Z is satisfied as the result of quantization, Equation (41) may be obtained.

a′ _(i) ^(Q)=(a _(i) Q _(p))^(Q)+(Q _(p) F )^(Q) l _(i) +O(ε)   (41)

Because this is the operation for setting the learning rate, when the quantization error is ignored, Equations (39) and (41) may be solved as shown in Equation (42).

a′ _(i) ^(Q) =a _(i) ^(Q)+(Q _(p) F )^(Q) l _(i) , b′ _(i) ^(Q) =b _(i)^(Q)−(Q _(p) F )^(Q) l _(i) ∀i∈N, a _(i) ^(Q) ,b _(i) ^(Q) ∈Z(0,Q_(p))   (42)

Because the learning rate set through the above process satisfies λ_(t)^(Q_L)∈Z(0,Q_(p)), when this is applied to the quantization unit search vector h_(t)^(Q), the following quantized learning equation may be obtained.

x _(t+1) ^(Q) =x _(t) ^(Q)−λ_(t) ^(Q_L) h _(t) ^(Q)   (43)
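A minimal integer-arithmetic sketch of this quantized golden search (Equations (36) through (43)) follows; it interprets l_i as the integer range b_i^Q − a_i^Q, ignores the O(ε) term as in Equation (42), and uses illustrative names:

```python
def quantized_golden_search(phi, Qp):
    """Golden-section search on the integer learning-rate grid Z(0, Qp).
    phi(l) evaluates f(x_t + (l / Qp) * h_t); the returned integer l plays
    the role of the quantized learning rate lambda_t^Q."""
    F = round(Qp * (1.0 - 0.618))           # (Qp * F)^Q with F = 1 - F0, F0 = 0.618
    a, b = 1, Qp - 1                        # a_i^Q, b_i^Q in Z(0, Qp), Equation (39)
    while b - a > 1:
        l = b - a                           # integer search range l_i (Equation (36))
        shift = (F * l + Qp // 2) // Qp     # rounded (Qp F)^Q * l_i / Qp (Equation (42))
        a_new, b_new = a + shift, b - shift # interior candidates a'_i, b'_i
        if phi(a_new) < phi(b_new):         # keep the half containing the smaller value
            b = b_new
        else:
            a = a_new
    return a if phi(a) <= phi(b) else b
```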

FIG. 4 is a flowchart illustrating an optimization algorithm using a compensated search vector according to an embodiment of the present invention.

Referring to FIG. 4, the process of compensating for the search performance of a quantized machine-learning algorithm at step S20 is specifically illustrated.

Here, at step S20, a point that makes the objective function smaller is searched for in a direction other than the search direction vector, whereby optimization performance degradation may be overcome.

First, at step S310, a quantization parameter may be set.

Also, at step S320, an initial parameter may be set.

Also, at step S330, a search direction vector may be calculated.

Also, at step S340, when the error of the search direction vector is equal to or less than a preset value, the optimization of the quantized machine-learning algorithm is terminated, but when the error is greater than the preset value, a learning rate may be set at step S350.

At step S350, the learning rate may be set using the Armijo rule or the golden search method, as described with reference to FIG. 2 and FIG. 3.

Also, at step S360, whether the quantized learning equation using the set learning rate reaches a local minimum point is determined. When the quantized learning equation is determined not to reach the local minimum point, the search direction vector may be compensated for at step S370.

Here, at step S360, with regard to performance degradation, it may be checked whether the optimization parameter x_(t) cannot converge near the local minimum point because the appropriate learning rate enabling arrival at the local minimum point cannot be applied due to quantization, or whether a better minimum point cannot be found.

For the search direction vector h_(t)∈R^(n), the quantized vector (h_(t))^(Q)=Q_(p)h_(t)^(Q)∈R^(n) may be represented as shown in Equation (44).

$\begin{matrix}{{\left( h_{t} \right)^{Q} = {\sum\limits_{i = 0}^{n - 1}{v_{i}e_{i}}}},{\forall{i \in {Z\left\lbrack {0,{n - 1}} \right\rbrack}}},{v_{i} \in Z}} & (44)\end{matrix}$

Here, at step S370, the different direction is the direction orthogonal to the search direction. Because a vector orthogonal to the search direction vector may have various vector directions, the vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector may be selected.

In Equation (44), e_(i) is the unit orthogonal vector of Euclidean space R^(n) and satisfies ∀i,j∈Z[0,n), ∥e_(i)∥=1, and e_(i)^(T)e_(j)=0 for i≠j. Assume that the largest component, among the components {v_(i)} of the quantized vector (h_(t))^(Q), is v_(m)=max{∥v_(i)∥}, and that the index thereof is m=argmax_(i){∥v_(i)∥}. Here, when the vector acquired by setting v_(m)=0 in the quantized vector (h_(t))^(Q) is v̄ and when the vector in which all of the components excluding v_(m) are 0 is v̂, v̄ and v̂ may be defined as shown in Equation (45).

$\begin{matrix}{{\overset{\_}{v} = {\sum\limits_{i = 0}^{n - 1}{\left( {1 - {\delta\left( {i - m} \right)}} \right)v_{i}e_{i}}}},{\hat{v} = {\sum\limits_{i = 0}^{n - 1}{{\delta\left( {i - m} \right)}v_{i}e_{i}}}}} & (45)\end{matrix}$

Therefore, the search direction vector may be divided into two orthogonal vectors v̄ and v̂, as shown in Equation (46).

$\begin{matrix}{\left( h_{t} \right)^{Q} = {{\overset{\_}{v} + \hat{v}} = {{\sum\limits_{i = 0}^{n - 1}{\left( {1 - {\delta\left( {i - m} \right)} + {\delta\left( {i - m} \right)}} \right)v_{i}e_{i}}} = {\sum\limits_{i = 0}^{n - 1}{v_{i}e_{i}}}}}} & (46)\end{matrix}$

At step S370, the vector z_(t) may be calculated as shown in Equation (47) in order to obtain the vector in the direction orthogonal to the largest component in the existing search direction vector (h_(t))^(Q).

z _(t) =v̄+r·v̂, r∈R   (47)

In Equation (47), r∈R is a proportional constant for v̂, and through this value, the orthogonal vector z_(t) may be calculated. Using the orthogonality between the vector z_(t) and the vector (h_(t))^(Q), r may be calculated as shown in Equation (48).

$\begin{matrix}\begin{matrix}{0 = {\left\langle {\left( h_{t} \right)^{Q},z} \right\rangle = \left\langle {{\overset{\_}{v} + \hat{v}},{\overset{\_}{v} + {r\hat{v}}}} \right\rangle}} \\{= {{{{\overset{\_}{v}}^{2} + {\left( {r + 1} \right)\left\langle {\hat{v},\overset{\_}{v}} \right\rangle} + {r{\hat{v}}^{2}}}\mspace{14mu}\because\left\langle {\hat{v},\overset{\_}{v}} \right\rangle} = 0}} \\{= {{\overset{\_}{v}}^{2} + {r{\hat{v}}^{2}}}}\end{matrix} & (48) \\{{\therefore r} = {{- \frac{{\overset{\_}{v}}^{2}}{{\hat{v}}^{2}}} = {{- \frac{{\left( h_{t} \right)^{Q}}^{2} - v_{m}^{2}}{v_{m}^{2}}} = {1 - \frac{{\left( h_{t} \right)^{Q}}^{2}}{v_{m}^{2}}}}}} & \;\end{matrix}$
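The decomposition of Equations (44) through (48) can be checked numerically; the following sketch (illustrative names, floating-point arithmetic for clarity) builds z_t = v̄ + r·v̂ and verifies the orthogonality:

```python
import numpy as np

def orthogonal_compensation(h_quant):
    h_quant = np.asarray(h_quant, dtype=float)
    m = int(np.argmax(np.abs(h_quant)))      # index m of the largest component
    v_hat = np.zeros_like(h_quant)           # v_hat keeps only component m (Equation (45))
    v_hat[m] = h_quant[m]
    v_bar = h_quant - v_hat                  # v_bar zeroes component m (Equation (45))
    r = 1.0 - float(np.dot(h_quant, h_quant)) / h_quant[m] ** 2   # Equation (48)
    return v_bar + r * v_hat                 # z_t = v_bar + r * v_hat (Equation (47))

h = np.array([3.0, -1.0, 2.0])
z = orthogonal_compensation(h)
print(np.dot(h, z))   # ~0: z_t is orthogonal to (h_t)^Q
```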

However, because |r|<1 is satisfied, when this value is applied to the learning equation without change, the operation is not performed using an integer value.

Accordingly, the compensated search vector may become a general real-number vector, rather than a vector configured with quantized values. Therefore, at step S370, the compensated search vector may be calculated in consideration of quantization of the proportional constant.

In Equation (47), because (h_(t))^(Q)=Q_(p)h_(t)^(Q) is satisfied, when the equation is solved using v_(m)=Q_(p)v_(m)^(Q), Equation (49) may be obtained.

$\begin{matrix}{\hat{v} = {{\sum\limits_{i = 0}^{n - 1}{{\delta\left( {i - m} \right)}v_{i}e_{i}}} = {{Q_{p} \cdot v_{m}^{Q}}e_{m}}}} & (49)\end{matrix}$

In Equation (49), when v_(m)^(Q)e_(m)≡v̂^(Q) is set, Equation (50) may be obtained.

$\begin{matrix}{\begin{matrix}{z_{t} = {\overset{\_}{v} + {r \cdot \hat{v}}} = {\overset{\_}{v} + {{r \cdot Q_{p}}{\hat{v}}^{Q}}}} \\{= {\overset{\_}{v} + {Q_{p}{\hat{v}}^{Q}} - {Q_{p}{\hat{v}}^{Q}} + {{r \cdot Q_{p}}{\hat{v}}^{Q}}}} \\{= {{\left( {\overset{\_}{v} + \hat{v}} \right) + {{Q_{p}\left( {r - 1} \right)}{\hat{v}}^{Q}}}\quad\because{\hat{v} = {Q_{p}{\hat{v}}^{Q}}}}}\end{matrix}} & (50)\end{matrix}$

Based on Equations (46) and (48) and (h_(t))^(Q)=Q_(p)h_(t)^(Q), Equation (51) may be obtained.

$\begin{matrix}{z_{t} = {{Q_{p}h_{t}^{Q}} - {Q_{p}\frac{{\left( h_{t} \right)^{Q}}^{2}}{v_{m}^{2}}{\hat{v}}^{Q}}}} & (51)\end{matrix}$

Accordingly, when the coefficient of v̂^(Q) is solved using Equations (11) and (25), Equation (52) may be obtained.

$\begin{matrix}{{Q_{p} \cdot \frac{{\left( h_{t} \right)^{Q}}^{2}}{v_{m}^{2}}} = {{Q_{p} \cdot \frac{\sum\limits_{i = 0}^{n - 1}v_{i}^{2}}{v_{m}^{2}}} = {\sum\limits_{i = 0}^{n - 1}\frac{Q_{p}v_{i}^{2}}{v_{m}^{2}}}}} & (52)\end{matrix}$

Accordingly, when the compensated search vector z_(t) is quantized using Equations (51) and (52), Equation (53) may be obtained.

$\begin{matrix}{\begin{matrix}{\left( z_{t} \right)^{Q} = \left( {{Q_{p}h_{t}^{Q}} - {Q_{p}\frac{{\left( h_{t} \right)^{Q}}^{2}}{v_{m}^{2}}{\hat{v}}^{Q}}} \right)^{Q}} \\{= {{Q_{p}h_{t}^{Q}} - {\left( {\sum\limits_{i = 0}^{n - 1}\frac{Q_{p}v_{i}^{2}}{v_{m}^{2}}} \right)^{Q}{\hat{v}}^{Q}}}} \\{= {{Q_{p}h_{t}^{Q}} - {\left\lfloor {{\sum\limits_{i = 0}^{n - 1}\frac{Q_{p}v_{i}^{2}}{v_{m}^{2}}} + 0.5} \right\rfloor{\hat{v}}^{Q}}}} \\{= {{Q_{p}h_{t}^{Q}} - {\left\lfloor {{\sum\limits_{i = 0}^{n - 1}\left\{ {\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor + {\frac{1}{v_{m}^{2}}\left( {{Q_{p}v_{i}^{2}} - {v_{m}^{2}\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor}} \right)}} \right\}} + 0.5} \right\rfloor{\hat{v}}^{Q}}}}\end{matrix}} & (53)\end{matrix}$

Therefore, Equation (53) may be solved to Equation (54) using Equation (4).

$\begin{matrix}{\left( z_{t} \right)^{Q} = {{Q_{p}h_{t}^{Q}} - {\left( {{\sum\limits_{i = 0}^{n - 1}\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor} + \left\lfloor {{\sum\limits_{i = 0}^{n - 1}{\frac{1}{v_{m}^{2}}\left( {{Q_{p}v_{i}^{2}} - {v_{m}^{2}\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor}} \right)}} + 0.5} \right\rfloor} \right){\hat{v}}^{Q}}}} & (54)\end{matrix}$

In Equation (54), $\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor$ and ${Q_{p}v_{i}^{2}} - {v_{m}^{2}\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor}$ are the quotient and remainder of $\frac{Q_{p}v_{i}^{2}}{v_{m}^{2}}.$

The part corresponding to the remainder may be simplified as shown in Equation (55).

$\begin{matrix}{{{Rem}\left( \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right)} = {\frac{1}{v_{m}^{2}}\left( {{Q_{p}v_{i}^{2}} - {v_{m}^{2}\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor}} \right)}} & (55)\end{matrix}$

The quantized orthogonal compensation search vector may be represented as shown in Equation (56).

$\begin{matrix}{\left( z_{t} \right)^{Q} = {{Q_{p}h_{t}^{Q}} - {\left( {{\sum\limits_{i = 0}^{n - 1}\left\lfloor \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right\rfloor} + \left\lbrack {\sum\limits_{i = 0}^{n - 1}{{Rem}\left( \frac{Q_{p}v_{i}^{2}}{v_{m}^{2}} \right)}} \right\rbrack} \right){\hat{v}}^{Q}}}} & (56)\end{matrix}$

That is, at step S370, the quantized compensation search vector z_(t), which is orthogonal to the search direction vector, may be calculated.

Here, at step S370, when the quantized learning equation is not able to escape from a local minimum point, the quantized learning equation may be made to escape from the local minimum point using the quantized orthogonal compensation search vector.

Because the quantized compensation search vector has a scale of a quantization coefficient Q_(p), it may be calculated as the basic quantization compensation search vector, as shown in Equation (57).

$\begin{matrix}{z_{t}^{Q}\overset{\Delta}{=}{\frac{1}{Q_{p}}\left( z_{t} \right)^{Q}}} & (57)\end{matrix}$

Also, at step S380, the quantized learning equation may be calculated.

That is, at step S380, based on the quantized compensation search vector defined in Equation (57), the quantized compensation search vector is multiplied by the quantized learning rate calculated using the quantized Armijo rule or the quantized line search algorithm, whereby the quantized learning equation shown in Equation (58) may be calculated.

The quantized learning equation shown in Equation (58) may be easily combined with the existing machine-learning or nonlinear algorithm.

x _(t+1) ^(Q) =x _(t) ^(Q)−λ_(t) ^(Q) h _(t) ^(Q) , h _(t) ^(Q) =z _(t)^(Q)   (58)
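Putting the pieces together, one possible shape of the loop of FIG. 4 is sketched below. The stopping threshold, the candidate learning rates, and the stall test are placeholder assumptions standing in for steps S310 through S380, and h_t is taken here as the gradient so that the minus sign of Equation (58) descends:

```python
import numpy as np

def optimize(f, grad_f, x0, Qp, max_iter=100, eps=1e-6):
    # Fixed-point quantization per Equation (5).
    quantize = lambda v: np.floor(Qp * np.asarray(v, dtype=float) + 0.5) / Qp
    x = quantize(x0)
    for _ in range(max_iter):
        h = grad_f(x)                        # search direction source (step S330)
        if np.linalg.norm(h) <= eps:         # stopping test (step S340)
            break
        hq = quantize(h / Qp)                # basic quantization direction, Equation (24)
        # Step S350: coarse stand-in for the quantized Armijo / golden search;
        # the integer learning rate lam_q plays the role of lambda^Q in Z(0, Qp).
        lam_q = min([z for z in (1, Qp // 8, Qp // 2) if z > 0],
                    key=lambda z: f(quantize(x - z * hq)))
        x_new = quantize(x - lam_q * hq)     # quantized learning equation, Equation (58)
        if f(x_new) >= f(x):                 # stalled near a local minimum (step S360)
            # Step S370: quantized orthogonal compensation search vector.
            m = int(np.argmax(np.abs(hq)))
            if hq[m] != 0.0:
                r = 1.0 - float(np.dot(hq, hq)) / hq[m] ** 2   # Equation (48)
                zt = hq.copy()
                zt[m] *= r                   # z_t = v_bar + r * v_hat, Equation (47)
                x_new = quantize(x - lam_q * quantize(zt))
        x = x_new
    return x

# Example: minimize a quadratic bowl centered at (1, 1).
f = lambda v: float(np.dot(v - 1.0, v - 1.0))
grad_f = lambda v: 2.0 * (v - 1.0)
print(optimize(f, grad_f, np.array([4.0, -2.0]), Qp=1000))  # approaches [1. 1.]
```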

FIG. 5 is a view illustrating a computer system according to an embodiment of the present invention.

Referring to FIG. 5, the apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention may be implemented in a computer system 1100 including a computer-readable recording medium. As illustrated in FIG. 5, the computer system 1100 may include one or more processors 1110, memory 1130, a user-interface input device 1140, a user-interface output device 1150, and storage 1160, which communicate with each other via a bus 1120. Also, the computer system 1100 may further include a network interface 1170 connected to a network 1180. The processor 1110 may be a central processing unit or a semiconductor device for executing processing instructions stored in the memory 1130 or the storage 1160. The memory 1130 and the storage 1160 may be any of various types of volatile or nonvolatile storage media. For example, the memory may include ROM 1131 or RAM 1132.

The apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention includes one or more processors 1110 and executable memory 1130 for storing at least one program executed by the one or more processors 1110. The at least one program may set the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculate a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensate for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculate an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

Here, the at least one program may set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.

Here, the at least one program may set any one of a first candidate value, which is acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, which is acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.

Here, when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value, the at least one program may set any one of the first candidate value and the second candidate value as the learning rate.

Here, the at least one program may select a vector in a direction orthogonal to the direction opposite the largest component vector of the search direction vector and quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.

Here, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the at least one program may make the solution escape from the local minimum point using the quantized orthogonal compensation search vector.

The present invention may implement an optimization algorithm capable of minimizing a quantization error in machine-learning and nonlinear-signal-processing fields using quantization and exhibiting excellent performance on lightweight hardware.

Also, the present invention may implement a machine-learning algorithm capable of providing sufficient optimization performance even on low-performance hardware.

As described above, the apparatus and method for optimizing a quantized machine-learning algorithm according to the present invention are not limitedly applied to the configurations and operations of the above-described embodiments, but all or some of the embodiments may be selectively combined and configured, so that the embodiments may be modified in various ways.

What is claimed is:
1. An apparatus for optimizing a quantized machine-learning algorithm, comprising: one or more processors; and executable memory for storing at least one program executed by the one or more processors, wherein the at least one program sets a learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculates a quantized orthogonal compensation search vector from a search direction vector of the quantized machine-learning algorithm, compensates for search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculates an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.
2. The apparatus of claim 1, wherein the at least one program sets the learning rate through a learning-rate-setting function predefined by the Armijo rule using a gradient vector of an objective function of the search direction vector.
3. The apparatus of claim 1, wherein the at least one program sets any one of a first candidate value, acquired by increasing a minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing a maximum candidate value of the learning rate by the golden ratio, as the learning rate.
4. The apparatus of claim 3, wherein the at least one program sets any one of the first candidate value and the second candidate value as the learning rate when a difference value between the first candidate value and the second candidate value is equal to or less than a preset value.
5. The apparatus of claim 1, wherein the at least one program selects a vector in a direction orthogonal to a direction opposite a largest component vector of the search direction vector and quantizes the selected vector, thereby calculating the quantized orthogonal compensation search vector.
6. The apparatus of claim 5, wherein, when a solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the at least one program makes the solution escape from the local minimum point using the quantized orthogonal compensation search vector.
7. A method for optimizing a quantized machine-learning algorithm, performed by an apparatus for optimizing the quantized machine-learning algorithm, comprising: setting a learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods; calculating a quantized orthogonal compensation search vector from a search direction vector of the quantized machine-learning algorithm and compensating for search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector; and calculating an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.
8. The method of claim 7, wherein setting the learning rate is configured to set the learning rate through a learning-rate-setting function predefined by the Armijo rule using a gradient vector of an objective function of the search direction vector.
9. The method of claim 7, wherein setting the learning rate is configured to set any one of a first candidate value, acquired by increasing a minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing a maximum candidate value of the learning rate by the golden ratio, as the learning rate.
10. The method of claim 9, wherein setting the learning rate is configured to set any one of the first candidate value and the second candidate value as the learning rate when a difference value between the first candidate value and the second candidate value is equal to or less than a preset value.
11. The method of claim 7, wherein compensating for the search performance is configured to select a vector in a direction orthogonal to a direction opposite a largest component vector of the search direction vector and to quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.
12. The method of claim 11, wherein compensating for the search performance is configured such that, when a solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution is made to escape from the local minimum point using the quantized orthogonal compensation search vector.